Conversation
…l and all backends Adds base_path property (returns tuple[str, ...]) and at() method stubs to ArrowDatabaseProtocol and all four backends (NoOpArrowDatabase, InMemoryArrowDatabase, ConnectorArrowDatabase, DeltaTableDatabase) to satisfy protocol conformance for ENG-341. Also renames DeltaTableDatabase's internal self.base_path (Path) instance attribute to self._local_base_path to avoid collision with the new tuple-typed property. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…rowDatabase Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds full at() and base_path implementation: scoped views share the underlying _tables/_pending_batches/_pending_record_ids dicts by reference, _get_record_key prepends the path prefix, and _validate_record_path checks combined prefix+record_path depth. to_config/from_config round-trip the base_path field. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Full implementation of path scoping: `at()` returns a new instance sharing the connector and pending dicts by reference; `base_path` exposes `_path_prefix`; `_get_record_key` and `_get_committed_table` prepend the prefix so scoped views transparently read/write to prefixed SQL table names. `_validate_record_path` checks combined depth (prefix + path). `to_config` serializes `base_path`. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements full database path contextualization for DeltaTableDatabase (ENG-341 Task 5): renames internal attributes (_base_uri→_root_uri, _local_base_path→_local_root), adds _path_prefix support, updates _get_table_uri to apply prefix, fixes _validate_record_path for combined depth checking, updates list_sources() to scan from scoped root, implements at() to return a properly scoped instance, and renames config key "base_path" (URI) to "root_uri" while adding "base_path" for the prefix tuple. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
There was a problem hiding this comment.
Pull request overview
Implements ENG-341 “database path contextualization” by introducing sub-scoped database views via at(*path_components) and a base_path tuple on ArrowDatabaseProtocol and all supported backends, plus tests and supporting design/plan docs.
Changes:
- Added
base_path: tuple[str, ...]andat(*path_components)toArrowDatabaseProtocoland implemented them for DeltaTable/InMemory/Connector/NoOp backends. - Updated config serialization to round-trip
base_pathacross backends, including renaming DeltaTableDatabase’s root location key from"base_path"→"root_uri". - Added/expanded unit tests for scoping behavior, config round-trips, and combined-depth validation.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
src/orcapod/protocols/database_protocols.py |
Extends the DB protocol with base_path and at() scoping API. |
src/orcapod/databases/noop_database.py |
Tracks base_path, implements at(), and includes base_path in config. |
src/orcapod/databases/in_memory_databases.py |
Implements scoped views by prefixing keys and sharing underlying dict storage. |
src/orcapod/databases/connector_arrow_database.py |
Implements scoped views by prefixing SQL table naming and sharing pending state. |
src/orcapod/databases/delta_lake_databases.py |
Implements scoped filesystem paths, updates depth validation, and renames config key to root_uri. |
tests/test_databases/test_noop_database.py |
Adds TestAtMethod coverage for NoOp scoping behavior. |
tests/test_databases/test_in_memory_database.py |
Adds TestAtMethod coverage + combined-depth validation for InMemory scoping. |
tests/test_databases/test_connector_arrow_database.py |
Adds TestAtMethod coverage including SQL table name prefixing. |
tests/test_databases/test_delta_table_database.py |
Adds scoped-path filesystem assertions and scoped list_sources() tests. |
tests/test_databases/test_database_config.py |
Updates Delta config expectations (root_uri) and adds base_path round-trip tests. |
superpowers/specs/2026-04-01-database-path-contextualization-design.md |
Design spec describing scoping semantics and per-backend implementation details. |
superpowers/plans/2026-04-01-database-path-contextualization.md |
Implementation plan documenting steps, testing strategy, and file map. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| @classmethod | ||
| def from_config(cls, config: dict[str, Any]) -> DeltaTableDatabase: | ||
| def from_config(cls, config: dict[str, Any]) -> "DeltaTableDatabase": | ||
| """Reconstruct a DeltaTableDatabase from a config dict.""" | ||
| return cls( | ||
| base_path=config["base_path"], | ||
| base_path=config["root_uri"], | ||
| storage_options=config.get("storage_options"), | ||
| create_base_path=True, | ||
| batch_size=config.get("batch_size", 1000), | ||
| max_hierarchy_depth=config.get("max_hierarchy_depth", 10), | ||
| allow_schema_evolution=config.get("allow_schema_evolution", True), | ||
| _path_prefix=tuple(config.get("base_path", [])), | ||
| ) |
There was a problem hiding this comment.
DeltaTableDatabase.from_config now requires config["root_uri"], which will raise KeyError when loading configs produced before this change (where the filesystem root was stored under "base_path"). Since pipeline deserialization calls cls.from_config(config) directly, this is a backwards-compatibility break. Consider accepting both shapes: treat config.get("root_uri") as preferred, but fall back to config.get("base_path") when it’s a string/Path-like root, while still supporting the new list-valued base_path prefix tuple.
There was a problem hiding this comment.
Fixed in fa61441. from_config now checks for "root_uri" first (current format) and falls back to "base_path" as the URI string (legacy pre-ENG-341 format). When falling back, _path_prefix is set to (). Covered by test_from_config_accepts_legacy_base_path in TestDeltaTableDatabaseConfig.
| return DeltaTableDatabase( | ||
| base_path=self._root_uri, | ||
| storage_options=self._storage_options, | ||
| batch_size=self.batch_size, | ||
| max_hierarchy_depth=self.max_hierarchy_depth, | ||
| allow_schema_evolution=self.allow_schema_evolution, | ||
| _path_prefix=self._path_prefix + path_components, |
There was a problem hiding this comment.
DeltaTableDatabase.at() accepts arbitrary path_components but does not validate them. Since these components become part of filesystem paths, allowing values like ".." or strings containing separators can cause unexpected directory traversal / incorrect scoping. Suggest validating path_components with the same rules as record_path components (non-empty str; disallow '/', '\', '\0', etc.; and consider explicitly rejecting '.'/'..').
| return DeltaTableDatabase( | |
| base_path=self._root_uri, | |
| storage_options=self._storage_options, | |
| batch_size=self.batch_size, | |
| max_hierarchy_depth=self.max_hierarchy_depth, | |
| allow_schema_evolution=self.allow_schema_evolution, | |
| _path_prefix=self._path_prefix + path_components, | |
| validated_components: list[str] = [] | |
| for component in path_components: | |
| if not isinstance(component, str): | |
| raise TypeError( | |
| f"DeltaTableDatabase.at() path components must be str, got {type(component)!r}" | |
| ) | |
| if component == "" or "/" in component or "\\" in component or "\0" in component: | |
| raise ValueError( | |
| f"Invalid path component {component!r}: must be non-empty and must not contain '/', '\\\\', or NUL" | |
| ) | |
| if component in (".", ".."): | |
| raise ValueError( | |
| f"Invalid path component {component!r}: '.' and '..' are not allowed" | |
| ) | |
| validated_components.append(self._sanitize_path_component(component)) | |
| return DeltaTableDatabase( | |
| base_path=self._root_uri, | |
| storage_options=self._storage_options, | |
| batch_size=self.batch_size, | |
| max_hierarchy_depth=self.max_hierarchy_depth, | |
| allow_schema_evolution=self.allow_schema_evolution, | |
| _path_prefix=self._path_prefix + tuple(validated_components), |
There was a problem hiding this comment.
Fixed in fa61441. at() now validates each component before constructing the new instance: rejects non-str, empty string, '.'/'..', and the filesystem-unsafe char set (/, \\, *, ?, ", <, >, |, NUL). I validated but did not pre-sanitize what gets stored in _path_prefix — _get_table_uri already applies _sanitize_path_component at use time, keeping base_path predictable across platforms. Covered by test_at_rejects_slash_in_component, test_at_rejects_dotdot_component, test_at_rejects_empty_component, test_at_rejects_non_str_component in TestAtMethod.
| @property | ||
| def base_path(self) -> tuple[str, ...]: | ||
| """The current relative root of this database view (always () for root instances).""" | ||
| return self._path_prefix | ||
|
|
||
| def at(self, *path_components: str) -> "InMemoryArrowDatabase": | ||
| """Return a new InMemoryArrowDatabase scoped to the given sub-path. | ||
|
|
||
| The returned instance shares the underlying storage dicts (_tables, | ||
| _pending_batches, _pending_record_ids) by reference, so writes | ||
| through any view are visible to all views of the same root database. | ||
| """ | ||
| return InMemoryArrowDatabase( | ||
| max_hierarchy_depth=self.max_hierarchy_depth, | ||
| _path_prefix=self._path_prefix + path_components, | ||
| _shared_tables=self._tables, | ||
| _shared_pending_batches=self._pending_batches, | ||
| _shared_pending_record_ids=self._pending_record_ids, | ||
| ) |
There was a problem hiding this comment.
InMemoryArrowDatabase.at() builds _path_prefix directly from path_components without validating them. Because _get_record_key uses '/'.join(...) and flush reconstructs paths by splitting on '/', any path component containing '/' or '\0' will break key round-tripping and can corrupt reads/writes across views. Consider validating path_components (same checks as _validate_record_path applies to record_path components) before constructing the scoped view.
There was a problem hiding this comment.
Fixed in fa61441. at() now validates each component before building _path_prefix: rejects empty strings, non-str values, and components containing '/' or '\\0'. These are the same characters rejected by _validate_record_path for individual components. Covered by test_at_rejects_slash_in_component, test_at_rejects_null_in_component, test_at_rejects_empty_component in TestAtMethod.
| @property | ||
| def base_path(self) -> tuple[str, ...]: | ||
| """The current relative root of this database view (always () for root instances).""" | ||
| return self._path_prefix | ||
|
|
||
| def at(self, *path_components: str) -> "ConnectorArrowDatabase": | ||
| """Return a new ConnectorArrowDatabase scoped to the given sub-path. | ||
|
|
||
| The returned instance shares the connector and all three pending dicts | ||
| (_pending_batches, _pending_record_ids, _pending_skip_existing) by reference. | ||
| Calling flush() on any view drains the entire shared pending queue. | ||
| """ | ||
| return ConnectorArrowDatabase( | ||
| connector=self._connector, | ||
| max_hierarchy_depth=self.max_hierarchy_depth, | ||
| _path_prefix=self._path_prefix + path_components, | ||
| _shared_pending_batches=self._pending_batches, | ||
| _shared_pending_record_ids=self._pending_record_ids, | ||
| _shared_pending_skip_existing=self._pending_skip_existing, | ||
| ) |
There was a problem hiding this comment.
ConnectorArrowDatabase.at() stores path_components into _path_prefix without validation. Since record keys are built via '/'.join(...) and flush reconstructs record_path using split('/'), a scoped prefix component containing '/' or '\0' will break round-tripping and could cause data to be written/read under unintended table names. Consider validating path_components (non-empty str; disallow '/' and '\0') before returning the new scoped instance.
There was a problem hiding this comment.
Fixed in fa61441. Same validation as InMemoryArrowDatabase.at(): rejects empty strings and components containing '/' or '\\0'. Covered by test_at_rejects_slash_in_component, test_at_rejects_null_in_component, test_at_rejects_empty_component in TestAtMethod.
… support
- InMemoryArrowDatabase.at() and ConnectorArrowDatabase.at() now reject
components containing '/' or '\0', which would corrupt the '/'-separated
record key scheme and break flush()'s key reconstruction via split('/')
- DeltaTableDatabase.at() validates components against the full filesystem
unsafe-char set plus '.' and '..' (directory traversal)
- DeltaTableDatabase.from_config() now accepts both current format
('root_uri' key) and legacy pre-ENG-341 format ('base_path' as a URI
string), preventing KeyError when deserializing older persisted configs
- Tests added for all four changes
Addresses Copilot review on PR #123
Review round 1 addressed — fa61441All four Copilot comments resolved in a single commit:
Tests added
405 database tests pass. |
Summary
Implements ENG-341: database path contextualization — sub-scoped database views.
at(*path_components)andbase_path: tuple[str, ...]toArrowDatabaseProtocolDeltaTableDatabase,InMemoryArrowDatabase,ConnectorArrowDatabase,NoOpArrowDatabaseInMemoryArrowDatabaseandConnectorArrowDatabaseshare underlying storage by reference;DeltaTableDatabasecreates a fresh instance (filesystem is shared storage)to_config/from_configround-tripsbase_pathfor all backendsDeltaTableDatabaseconfig key rename:"base_path"(URI root) →"root_uri";"base_path"now carries the scoping prefix tuple_validate_record_pathupdated to check combinedlen(base_path) + len(record_path)depthlist_sources()on scopedDeltaTableDatabaseinstances scans from the scoped rootCloses ENG-341
Test plan
TestAtMethodtest classeslist_sources()scoping test for DeltaTable backend🤖 Generated with Claude Code